METHOD AND APPARATUS FOR MOTION ESTIMATION IN IMAGE-SEQUENCES
WITH EFFICIENT CONTENT-BASED SMOOTHNESS CONSTRAINT

BACKGROUND OF THE INVENTION 

Field of the Invention 

The invention relates to the image processing of 
motion picture and video sequences for various purposes 
including improving image quality and compression of image 
sequence (e.g., video) data signals. 

Background 

The invention provides enhancements to the process
of estimating motion in image-sequences such as those that
originate from motion pictures or television video. The
invention is applicable to any source of image-sequences.

Motion in image-sequences is analyzed for various
reasons. For example, it is a component of various methods
for image-sequence (e.g., video) quality enhancement,
generation of interpolated frames between the frames of an
image-sequence, image-sequence compression, removal of noise
present in image-sequences, and more. For example, motion
estimation can be used to improve images because it allows
images of different frames to be averaged. Averaging reduces
noise because images of the same subject taken over and over,
if averaged, produce a higher quality representation of the
subject than any of the original images. In image-sequences,
such as video, successive frames are often very similar
except for the fact that parts of the image are displaced
relative to their positions in other frames. For example, a
truck drives by and each frame shows the truck in a slightly
different position. Even though the frames are different, by
compensating for the motion it is possible to average the
displaced parts of their images.

Generating frames between existing frames, for
example for frame rate conversion, requires motion
estimation, since, if something in an image moves from one
position to another in successive frames, it should move only
a fraction of that distance, in the same direction, in the
intervening frames.
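The fractional movement described above may be illustrated
by the following minimal sketch. It assumes that motion
between the two existing frames is linear; the function name
and its parameters are hypothetical and are introduced only
for illustration.

```python
def interpolate_position(position, displacement, fraction):
    """Return where a region sits in an intermediate frame.

    position: (x, y) of the region in the first frame.
    displacement: (dx, dy) motion of the region between the two existing frames.
    fraction: temporal position of the new frame, from 0.0 (first frame)
        to 1.0 (second frame). Linear motion is an assumption of this sketch.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    x, y = position
    dx, dy = displacement
    return (x + fraction * dx, y + fraction * dy)


# A region that moves 8 pixels right and 4 pixels down between two frames
# is placed half of that distance away in a frame inserted midway.
midpoint = interpolate_position((10.0, 20.0), (8.0, 4.0), 0.5)
```

A frame inserted one quarter of the way between the existing
frames would use a fraction of 0.25 in the same manner.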

Motion estimation may be applied to portions of the
image frames making up an image-sequence. That is, the
frames may be cut up into the same number and shape of parts,
say squares, and the movement of each part detected from
frame to frame. In the truck example above, the portion
might be a square block from the side of the truck with some
parts of the owner's logo. The motion estimation process,
running on a computer, searches a neighborhood of the same
location in the next (or previous) frame for the block that
most closely matches it (i.e., contains the same parts of the
logo). Assuming the truck was moving gradually and not too
fast, the corresponding block in the second frame would be
expected to be found in the neighborhood of the same location
as the block in the first frame. In the illustrative example
above the blocks are chosen to be square, but they could have
any shape, and the shapes could vary from block to block.
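The neighborhood search described above may be sketched as
follows. The sketch assumes single-channel frames stored as
lists of rows and uses the sum of absolute differences as the
measure of closeness; that measure, the function names, and
the toy frames are assumptions made for illustration and are
not prescribed by this description.

```python
def block_sad(reference, target, top, left, dy, dx, size):
    """Sum of absolute differences between a reference block and a displaced target block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(reference[top + y][left + x]
                         - target[top + dy + y][left + dx + x])
    return total


def best_displacement(reference, target, top, left, size, search_range):
    """Search a neighborhood of the block's own position for the closest match."""
    height, width = len(target), len(target[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would fall outside the target frame.
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = block_sad(reference, target, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def make_frame(patch_top, patch_left):
    """Build a 6 x 6 dark frame containing one bright 2 x 2 patch (toy data)."""
    frame = [[0] * 6 for _ in range(6)]
    for y in range(2):
        for x in range(2):
            frame[patch_top + y][patch_left + x] = 9
    return frame


reference_frame = make_frame(2, 1)
target_frame = make_frame(2, 3)  # the patch has moved two pixels to the right
cost, dy, dx = best_displacement(reference_frame, target_frame, 2, 1, 2, 2)
```

In this toy case the search finds a perfect match at a
displacement of two pixels to the right. Real frames rarely
match perfectly, which is why the quality of the match matters.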

If one considers the source of motion in
image-sequences, for example the physical movement of various
subjects relative to a camera (or its equivalent, for example
in animations), motion in image-sequences can be described as
the movement of various blobs of color and light on the
screen. The assumption that blobs simply move around is
imperfect because they also rotate, shrink (e.g., when an
object is gradually hidden), disappear (e.g., at scene
breaks), etc., but it is not necessary to consider where
motion estimation fails for purposes of understanding the
invention. If the motion estimation fails for certain parts
of an image or certain image-sequences, the motion
information may simply be ignored and not used for its
intended purposes. For example, if the goal is quality
enhancement, the relevant portions may be skipped over and
the images left untreated or treated in some way that does
not require motion estimation.

As the various blobs in an image-sequence may have
different shapes and may move in different directions and at
different speeds, a square block that contains portions of
different blobs that are moving differently is not
susceptible to straightforward motion interpretation. Motion
estimation is unambiguously successful when a block in a
first frame substantially matches (looks like) a block in a
second frame. The process used to discover how a block has
moved is responsive to whether a block in the second image
frame matches the block in the first image frame. If there
is not a good match, then the motion estimation may be
invalid. The estimation of how well blocks in adjacent images
match is called "correspondence" and the requirement that the
match reach some level of goodness is called the
"correspondence constraint."

There is another constraint involved in estimating
motion of blocks. This constraint stems from the fact that
the motions of the blobs determined purely by block matching
are believed to be less smooth than they should be. Thus, if
only block-matching were used to predict motion, the
resulting motion prediction would be overly responsive to
noise, changes in illumination, complex motion of numerous
small objects like tree foliage, etc., and would therefore
fail to reflect what would normally be considered the natural
motion desired. To improve the motion estimations for the
blocks, assuming typical moving blobs are bigger than the
block size, one may look to adjacent blocks under the
assumption that the blocks of which moving blobs are made
move in unison. Thus, in estimating motion, the displacements
of neighboring blocks are taken into account so that
neighboring blocks tend to move in unison.

The assumption that neighboring blocks move in
unison is called a "smoothness constraint." To enforce the
smoothness constraint, the calculation of displacement
estimates is implemented such that displacement estimates are
urged toward the same values for neighboring regions. To
accomplish this, one may think of calculating a single
"energy" value that depends on two factors: (a) how well all
the displaced regions match corresponding regions on the
second frame (correspondence) and (b) how well the region
displacements match those of their respective neighbors
(smoothness). The energy value would be large when either
the correspondence or smoothness constraint is poorly
satisfied and small when both are well satisfied. The
optimization amounts to calculating all the displacement
vectors so as to minimize this combined energy value. This
optimization process can be accomplished by various
computational techniques that are known in the art.

It should be apparent that the smoothness constraint
is not applicable for all blocks because, just as blocks
belonging to differently-moving blobs do not fit the
correspondence constraint, neighboring blocks belonging to
differently-moving blobs do not fit the smoothness
constraint. In the prior art, there are various ways in
which the smoothness constraint can be relaxed, or permitted
to be broken, to allow for situations where neighboring
blocks belong to different blobs. For example, the
constraint between blocks may be broken when the blocks are
apparently from different blobs. This can be done by
analyzing the image content to identify features that
indicate when neighboring blocks belong to different blobs.
One image processing technique detects edges (abrupt changes
in color and/or luminance that lie along a line) under the
assumption that the edge defines a boundary between different
blobs. When edges are found between blocks, the smoothness
constraint between those blocks is relaxed, or allowed to be
broken. The assumption underlying the edge-detection
approach is not always valid, but it can lead to
improvements.

There are other, quite sophisticated computational
techniques for adjusting the smoothness constraint so that it
is enforced only where applicable. The more sophisticated of
these techniques may involve a process called segmentation,
which identifies separate blobs. These techniques in turn
use motion estimation, so the process is iterative and,
therefore, takes a great deal of time on a computer. As a
result, there is a need in the art for techniques for
modifying the smoothness constraint that are not
computationally intensive and produce good results.

Summary of Prior Art

To put the above discussion in more precise
technical terms, the goal of 2D motion estimation is to
determine how different parts of each image in an
image-sequence move from frame to frame. The result is
usually described by an array of two-dimensional displacement
vectors d(r), indicating how a region (e.g., block) r in a
current image frame has moved to r + d(r) in a following or
previous image frame. For purposes of this discussion, a
current image frame may be referred to as a "reference frame"
and a temporally neighboring frame as a "target frame."

Displacement vectors are defined at sites r ∈ ℜ,
where the finite set ℜ is a subset of all possible region
positions. Practical methods for motion estimation are based
on the combination of two constraints: the correspondence
constraint and the smoothness constraint. The correspondence
constraint ensures that a region r of a reference image is
reasonably well mapped to a region r + d(r) in a target
frame. In other words, region r + d(r) in the target frame
should have image properties like texture, luminance, and/or
color close to those of the region r in the reference frame.
The details of how the correspondence constraint is designed
and enforced are not relevant to an understanding of the
invention and will not be described further.

The smoothness constraint is based on the
assumption that neighboring parts of an image region r
frequently move together; that is, they are all described by
similar motion vectors d(r). A simple form of smoothness
constraint may be described by an energy function, which does
not depend explicitly on image content:

(1) E_s = \sum_{r0 \in \Re} \sum_{r1 \in N(r0)} f_s(|d(r0) - d(r1)|),

where N(r) is the spatial neighborhood of site r, and
function f_s is a suitable (preferably, monotonic) function
that approaches a minimum when its argument decreases to
zero. To implement the smoothness constraint, the values for
the displacement vectors d(r), r ∈ ℜ, that correspond to
the lowest possible value of E_s are found by any suitable
computational technique.
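The content-independent smoothness energy of equation (1),
a double sum over sites and their neighbors of a penalty on
the difference of their displacement vectors, may be sketched
as follows. The sketch assumes a grid of sites with up to
eight neighbors each and a squared-magnitude penalty; both
choices, and all names used, are assumptions made for
illustration.

```python
import math


def neighbors(site, sites):
    """Return the (up to eight) grid neighbors of a site that exist in `sites`."""
    row, col = site
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and (row + dr, col + dc) in sites:
                result.append((row + dr, col + dc))
    return result


def smoothness_energy(displacements, penalty=lambda magnitude: magnitude ** 2):
    """Double sum over sites and their neighbors of penalty(|d(r0) - d(r1)|)."""
    energy = 0.0
    for r0, d0 in displacements.items():
        for r1 in neighbors(r0, displacements):
            d1 = displacements[r1]
            energy += penalty(math.hypot(d0[0] - d1[0], d0[1] - d1[1]))
    return energy


# Four blocks on a 2 x 2 grid, all moving by the same vector.
uniform = {(0, 0): (1, 0), (0, 1): (1, 0), (1, 0): (1, 0), (1, 1): (1, 0)}
# The same field with one block moving differently from its neighbors.
outlier = dict(uniform)
outlier[(1, 1)] = (1, 2)

uniform_energy = smoothness_energy(uniform)
outlier_energy = smoothness_energy(outlier)
```

A uniform field yields zero energy, while a single deviating
block raises the energy through every neighbor pair in which
it takes part, regardless of whether the deviation reflects a
real object boundary.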

A disadvantage of the above smoothness constraint
is that it encourages smoothness of displacement vectors that
may belong to different blobs undergoing different motions.
The various prior art methods developed to break the
smoothness constraint between objects are variously based on
adding some image-content dependent factors to the function
f_s. To formulate a good smoothness constraint, the image
needs to be segmented. Robust image segmentation should, in
turn, use motion estimation. This can lead to complex,
computation-intensive recursive processes. Simpler methods
break the smoothness constraint at "edges", defined as
connected sites of local maxima of the image gradient. This
approach requires choosing threshold values that differ for
different image-sequences.

The invention will be described in connection with 
certain preferred embodiments, so that it may be more fully 
understood. The particulars shown are by way of example and 
for purposes of illustrative discussion of the preferred 
embodiments of the present invention only, and are presented 
in the cause of providing what is believed to be the most 
useful and readily understood description of the principles 
and conceptual aspects of the invention. In this regard, no 
attempt is made to show structural details of the invention 
in more detail than is necessary for a fundamental 
understanding of the invention, the description making 
apparent to those skilled in the art how the several forms of 
the invention may be embodied in practice. 

SUMMARY OF THE INVENTION

Briefly, motion estimation employs a smoothness
constraint which is strengthened for reference regions
characterized by an image property that is close to that of
neighboring regions. Preferably, the image property should
be a normalized figure to account for inherent variability
distributed over the region.

In prior art methods of smoothing the displacement
vector field, the smoothness constraint is relaxed, or
allowed to be broken, based on image content. The proposed
methods, however, have proven very complex. According to the
invention, a new form of smoothness constraint, which has low
computational complexity, is employed. To describe the method
simply, a value that defines how well all the displacement
vectors satisfy both the smoothness constraint and the
correspondence constraint takes into account an average
property, such as color, of neighboring regions. The
displacements that are calculated for neighboring regions
differing greatly in the average property from a given region
contribute little to the calculated smoothness quality of the
displacement vector field estimate. In contrast,
displacements that are calculated for neighboring regions
that differ little in the average image property from the
given region contribute greatly to the calculated smoothness
quality of the displacement field estimate.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to an embodiment, the image property used
for the above method is an average color of the region. The
problem of calculating a field of displacement vectors that
satisfies both correspondence and smoothness constraints may
be expressed in the following way: Find a set of displacement
vectors d(r) that minimizes a combination (e.g., a linear
combination) of correspondence energy E_c and smoothness
energy E_s:

(2) \min_{\{d(r)\},\, r \in \Re} (E_c + p \cdot E_s),

where p is a heuristic parameter that controls the strength
of the smoothness constraint. Equation (2) is essentially
equivalent to ones described in B. K. P. Horn and B. G.
Schunck, "Determining optical flow", Artificial Intelligence,
Vol. 17, pp. 185-203, 1981, and in A. Murat Tekalp, "Digital
Video Processing", Prentice-Hall, 1995, ISBN 0131900757.
Equation (2) is presented here only to explain the relation
between correspondence and smoothness constraints and their
role in motion estimation. In general it is not necessary to
explicitly use two energy terms. For example, in Sergei V.
Fogel, "The Estimation of Velocity Vector Fields from
Time-Varying Image Sequences", CVGIP: Image Understanding,
Vol. 53, pp. 253-287, 1991, expression (2) was not used, but
the author operated directly with constraints that logically
contained correspondence and smoothness components. Equation
(2) and its alternatives may be solved using a variety of
approaches, for example, by an iterative procedure that
minimizes total energy (2) for one vector d(r) at a time, or
by forming a large system of nonlinear equations that
includes the whole array of displacement vectors from the
reference image.
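The iterative procedure mentioned above, in which the total
energy is minimized for one displacement vector at a time,
may be sketched as follows. For brevity the sketch assumes a
one-dimensional row of blocks, scalar candidate
displacements, a precomputed table of correspondence costs,
and a squared-difference smoothness penalty. All of these, as
well as the names used, are hypothetical simplifications and
not the specific procedure of any cited reference.

```python
def minimize_one_at_a_time(match_cost, candidates, p, sweeps=10):
    """Greedy per-site minimization of E_c + p * E_s on a 1-D row of blocks.

    match_cost[r][d] is a hypothetical precomputed correspondence cost of
    assigning candidate displacement d to block r.
    """
    n = len(match_cost)
    # Start from the displacement that best satisfies correspondence alone.
    d = [min(candidates, key=lambda c: match_cost[r][c]) for r in range(n)]
    for _ in range(sweeps):
        changed = False
        for r in range(n):
            def local_energy(c):
                # Only the energy terms that involve block r change with c.
                smooth = sum((c - d[q]) ** 2 for q in (r - 1, r + 1) if 0 <= q < n)
                return match_cost[r][c] + p * smooth
            best = min(candidates, key=local_energy)
            if best != d[r]:
                d[r] = best
                changed = True
        if not changed:
            break  # no vector changed during a full sweep
    return d


candidates = [0, 1, 2]
# Five blocks that truly move by 1; the middle block is corrupted by noise
# so that block matching alone slightly prefers a displacement of 2.
match_cost = [
    {0: 5, 1: 0, 2: 5},
    {0: 5, 1: 0, 2: 5},
    {0: 5, 1: 1, 2: 0},
    {0: 5, 1: 0, 2: 5},
    {0: 5, 1: 0, 2: 5},
]
without_smoothness = minimize_one_at_a_time(match_cost, candidates, p=0.0)
with_smoothness = minimize_one_at_a_time(match_cost, candidates, p=1.0)
```

With p set to zero the noisy block keeps its deviating
estimate; with a nonzero p the neighboring blocks pull it
back toward the common motion. Such a procedure finds a local
minimum only, which is one reason other solution approaches
may be preferred.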

In an embodiment conforming to the form of equation
(2), the smoothness component of an energy equation is as
follows:

(3) E_s = \sum_{r0 \in \Re} \sum_{r1 \in N(r0)} f_s(c(r0), c(r1), v(r0), v(r1), d(r0), d(r1)),

where c(r) and v(r) are functions that represent color and
color variation, respectively. The c(r) and v(r) functions
are vector-valued functions having as many components as
there are color channels in the image-sequence. The c(r)
function represents the average pixel color of the
reference image in a neighborhood of a site r; v(r)
represents variation of color in a neighborhood of r; and
f_s(c0, c1, v0, v1, d0, d1) (using a shorthand notation, c0
representing c(r0), c1 representing c(r1), and so on) is a
scalar function with the following properties:

• As c0 gets closer to c1, the closeness being measured by
corresponding components of v0, v1, the sensitivity of f_s
to small changes in d0 and d1 increases toward a maximum.

• As the difference between c0 and c1 significantly exceeds
corresponding components of both v0 and v1, f_s becomes
less sensitive to changes in d0 and d1.



To implement the method, the single energy function (2) that
includes both E_s and E_c is minimized. The total energy
includes inputs from all reference region displacements d0
(which is the outer sum in equation (3)) and, for every
reference region with displacement d0, from the displacements
d1 of all neighboring regions (which is the inner sum in
equation (3)). Again, although the smoothness energy is
referred to apart from the correspondence energy, the two
need not be separable
components of a function to be minimized in calculating the
displacement vector field. In this example embodiment,
however, the correspondence energy and smoothness energy form
a linear combination.

There are many ways to satisfy the above functional
requirements. One example is a preferred expression for
smoothness energy described below. Let each image in an
image-sequence be defined on an n_x × n_y rectangular grid
and have n_c channels. Images are divided into n_b × n_b
square blocks B(r), where r points to the center of the
block. One displacement vector d(r) is calculated for each
block. The resulting set of displacement vectors d(r) forms
a rectangular grid.

Displacement vectors are calculated by minimizing a
total energy expressed as a sum of correspondence energy E_c
and smoothness energy E_s as in equation (2). Correspondence
energy E_c may be calculated as a sum of terms that describe
how well pixels in block B(r) at r in the reference image
correspond to a group of pixels around r + d(r) in the
target image. The total energy is calculated over all
r ∈ ℜ. The exact form of the correspondence energy component
is not essential to the practice of the present embodiment of
the invention where the focus is on the contribution of the
smoothness constraint. Smoothness energy E_s is calculated
using equation (3), where N(r) is a set of at most eight
blocks ("at most" for purposes of this illustrative example,
only) that are the nearest spatial neighbors of block r.
Functions c(r) and v(r) are vector-valued n_c-component
functions, each component k = 1, ..., n_c calculated from
reference image data i(x) within the block B(r):

(4) c_k(r) = \frac{1}{n_b^2} \sum_{x \in B(r)} i_k(x),

(5) v_k(r) = \sqrt{\frac{\sum_{x \in B(r)} (i_k(x) - c_k(r))^2}{n_b^2 - 1} + \sigma_k^2},

where σ_k represents a background variation of the image data
i_k(x) resulting from noise or grain.
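The per-block quantities defined above, namely the average
value of each color channel over the block and a variation
figure that combines the sample variance within the block
with a background noise term, may be computed as in the
following sketch. The data layout (a block as a list of rows
of per-channel tuples) and the function name are assumptions
made for illustration.

```python
import math


def block_statistics(block, noise_sigma):
    """Per-channel mean and variation of one square block.

    block: list of rows, each row a list of pixels, each pixel a tuple of
        channel values (a hypothetical layout chosen for this sketch).
    noise_sigma: per-channel background variation (noise or grain level).
    """
    pixels = [pixel for row in block for pixel in row]
    count = len(pixels)  # n_b squared for an n_b x n_b block
    channels = len(pixels[0])
    mean = [sum(p[k] for p in pixels) / count for k in range(channels)]
    variation = [
        math.sqrt(
            sum((p[k] - mean[k]) ** 2 for p in pixels) / (count - 1)
            + noise_sigma[k] ** 2
        )
        for k in range(channels)
    ]
    return mean, variation


# A 2 x 2 block with two channels: the first varies, the second is flat.
example_block = [[(0, 5), (0, 5)], [(3, 5), (3, 5)]]
mean, variation = block_statistics(example_block, noise_sigma=(1.0, 0.5))
```

Because the noise term is added under the square root, the
variation of a perfectly flat channel equals the background
noise level rather than zero, which keeps later divisions by
the variation well defined.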

Function f_s in (3) then has the following form:

(6) f_s(c0, c1, v0, v1, d0, d1) =
    \exp\left(-\frac{1}{5} \sum_k \left(\max\left(0, \frac{(c0_k - c1_k)^2}{v0_k^2 + v1_k^2} - 1\right)\right)^2\right) \cdot
    \prod_k (1 - (v0_k^2 - v1_k^2) /

Expression (6) satisfies both the requirements for f_s, as
described above. An important feature of the smoothness
constraint function is that smoothness is encouraged only
between blocks that have similar color patterns.
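The color-dependent behavior of expression (6) may be
illustrated with the following sketch, which evaluates only
the leading exponential factor of the expression as it can be
read from the text: the squared color difference in each
channel is normalized by the combined squared variation of
the two blocks, the excess over one is squared and summed
over channels, and the sum is divided by five inside a
negative exponential. The reading of the grouping, the
function name, and the omission of the remaining factor of
the expression are all assumptions of this sketch.

```python
import math


def color_similarity_weight(c0, c1, v0, v1):
    """Leading exponential factor of expression (6), as read from the text.

    c0, c1: per-channel average colors of two neighboring blocks.
    v0, v1: per-channel color variations of the same blocks (assumed
        nonzero, e.g. because they include a background noise term).
    """
    total = 0.0
    for c0_k, c1_k, v0_k, v1_k in zip(c0, c1, v0, v1):
        # Squared color difference, normalized by the combined variation.
        normalized = (c0_k - c1_k) ** 2 / (v0_k ** 2 + v1_k ** 2)
        # Differences within the blocks' own variation contribute nothing.
        total += max(0.0, normalized - 1.0) ** 2
    return math.exp(-total / 5.0)


# Two blocks whose colors differ by less than their variation.
similar = color_similarity_weight((10.0,), (10.5,), (1.0,), (1.0,))
# Two blocks whose colors differ far more than their variation.
different = color_similarity_weight((0.0,), (10.0,), (1.0,), (1.0,))
```

The factor stays at one for blocks of similar color and falls
rapidly toward zero as the color difference exceeds the
blocks' variation, so that displacement differences between
dissimilar blocks contribute little to the smoothness energy.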

It will be evident to those skilled in the art that 
the invention is not limited to the details of the foregoing 
illustrative embodiments, and that the present invention may 
be embodied in other specific forms without departing from 
the spirit or essential attributes thereof. The present 
embodiments are therefore to be considered in all respects as 
illustrative and not restrictive, the scope of the invention 
being indicated by the appended claims rather than by the 
foregoing description, and all changes which come within the 
meaning and range of equivalency of the claims are therefore 
intended to be embraced therein. 